Hi, we're going to discuss exercise sheet three today.
Exercise number six is about proving that the Tikhonov-regularized reconstruction can be characterized equivalently by solving a quadratic optimization problem and by solving the linear normal equation, respectively.
So the generalized Tikhonov regularized solution is the minimizer of the following quadratic functional: 1/2 ||Au - f||^2 + lambda/2 ||Lu||^2.
I'm going to write this as F(u), and it's easy to see that F(u) can also be written as 1/2 u^T A^T A u + lambda/2 u^T L^T L u - u^T A^T f + 1/2 f^T f.
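As a quick sanity check on that expansion (not part of the exercise itself), here is a small NumPy sketch; the matrices A and L, the data f, the point u, and the value of lambda are arbitrary test choices.

```python
import numpy as np

# Check that the two ways of writing F(u) agree:
# 1/2 ||Au - f||^2 + lambda/2 ||Lu||^2
#   = 1/2 u^T A^T A u + lambda/2 u^T L^T L u - u^T A^T f + 1/2 f^T f
# A, L, f, u, lam are arbitrary test data for illustration.
rng = np.random.default_rng(0)
m, n = 7, 4
A = rng.standard_normal((m, n))
L = rng.standard_normal((n, n))
f = rng.standard_normal(m)
u = rng.standard_normal(n)
lam = 0.3

lhs = 0.5 * np.linalg.norm(A @ u - f) ** 2 + 0.5 * lam * np.linalg.norm(L @ u) ** 2
rhs = (0.5 * u @ (A.T @ A) @ u + 0.5 * lam * u @ (L.T @ L) @ u
       - u @ (A.T @ f) + 0.5 * f @ f)
print(abs(lhs - rhs))  # should be on the order of machine precision
```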
And the minimizer of a quadratic functional is given by the unique root of its gradient, well, unique if the matrix A^T A + lambda L^T L is SPD.
So we just have to compute the gradient of F at u, and taking the gradient of this quadratic expression, something you have probably learned in some calculus class, gives grad F(u) = A^T A u + lambda L^T L u - A^T f.
Now you see that grad F(u) = 0 if and only if the linear equation (A^T A + lambda L^T L) u = A^T f holds.
That is exactly the normal equation for generalized Tikhonov regularization.
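To see the two characterizations agree in practice, here is a minimal NumPy sketch that solves the normal equation and checks that the gradient vanishes there and that nearby points have a larger value of F; A, L, f, and lambda are arbitrary test data, not taken from the exercise sheet.

```python
import numpy as np

# Solve the generalized Tikhonov normal equation (A^T A + lam L^T L) u = A^T f
# and verify that the solution minimizes F(u) = 1/2 ||Au - f||^2 + lam/2 ||Lu||^2.
rng = np.random.default_rng(0)
m, n = 20, 10
A = rng.standard_normal((m, n))
L = np.eye(n)                      # simple choice of regularization operator
f = rng.standard_normal(m)
lam = 0.1

def F(u):
    return 0.5 * np.linalg.norm(A @ u - f) ** 2 + 0.5 * lam * np.linalg.norm(L @ u) ** 2

u_star = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ f)

# The gradient A^T A u + lam L^T L u - A^T f vanishes at u_star ...
grad = (A.T @ A + lam * L.T @ L) @ u_star - A.T @ f
print("||grad F(u_star)|| =", np.linalg.norm(grad))

# ... and perturbing u_star in random directions only increases F.
for _ in range(5):
    d = rng.standard_normal(n)
    assert F(u_star + 1e-3 * d) >= F(u_star)
```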
Exercise seven.
This is about the gradient descent algorithm. The gradient descent algorithm has the form x_{n+1} = x_n - alpha_n grad phi(x_n), with some step size alpha_n. After enough steps this leads to the minimizer of phi(x), or at least to a local minimum.
Now, what if phi(x) has the specific form phi(x) = 1/2 x^T B x - b^T x? In particular, the generalized Tikhonov functional is of exactly this form: we just have to put B = A^T A + lambda L^T L and lowercase b = A^T f, and that leads to exactly the same form (the constant 1/2 f^T f does not affect the minimizer).
And for this quadratic we have grad phi(x) = Bx - b. We call the negative gradient, b - Bx, the residual R.
And the specific form of gradient descent for quadratic functionals is something that we discussed in quite some detail a few lectures ago. The update here is that we take x_n and add alpha_n R_n, since R_n is the negative gradient, and the step size is alpha_n = (R_n^T R_n) / (R_n^T B R_n).
This is the correct step size to choose.
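Here is a short sketch of this quadratic gradient descent, with B and b as arbitrary SPD test data rather than the Tikhonov matrices from the exercise; it just illustrates the update x_{n+1} = x_n + alpha_n R_n with the step size above.

```python
import numpy as np

def gradient_descent_quadratic(B, b, x0, n_steps=200, tol=1e-10):
    """Minimize phi(x) = 1/2 x^T B x - b^T x by gradient descent with the
    step size alpha_n = (R_n^T R_n) / (R_n^T B R_n) from the lecture."""
    x = x0.copy()
    for _ in range(n_steps):
        r = b - B @ x                    # residual = negative gradient
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (B @ r))  # step size for the quadratic case
        x = x + alpha * r                # x_{n+1} = x_n + alpha_n R_n
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
B = M.T @ M + np.eye(8)                  # symmetric positive definite test matrix
b = rng.standard_normal(8)
x = gradient_descent_quadratic(B, b, np.zeros(8))
print("error vs direct solve:", np.linalg.norm(x - np.linalg.solve(B, b)))
```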
And the goal of this exercise is to prove that if R_0, the initial residual, the residual at the first iterate, which can be written as b - B x_0, is an eigenvector of B, meaning B R_0 = lambda R_0 for some lambda > 0 (lambda could also be zero, but we won't discuss that case), then x_1, the first iterate, will be equal to the minimizer of the functional phi.
So instead of the zigzag behaviour you typically see for gradient descent, starting with the right initialization gives you a gradient descent algorithm that converges in just one step, which is ideal, of course.
Now how can we prove this?
The first step is to write down what we know: we know that R_0 = -B E_0, where E_0 = x_0 - x* is the initial error, and this in particular implies that B^{-1} R_0 = -E_0.
Now, the eigenvector relationship implies that R_0 = lambda B^{-1} R_0, which means that R_0 is also an eigenvector of B^{-1}, with eigenvalue 1/lambda.
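From here the argument closes quickly: B^{-1} R_0 = (1/lambda) R_0 equals -E_0, and the step size is alpha_0 = (R_0^T R_0)/(R_0^T B R_0) = 1/lambda, so x_1 = x_0 + alpha_0 R_0 = x_0 - E_0, which is the minimizer. The following sketch checks this one-step convergence numerically; B, b, and the chosen eigenvector are arbitrary test data.

```python
import numpy as np

# If R_0 = b - B x_0 is an eigenvector of B, one gradient-descent step with
# alpha_0 = (R_0^T R_0)/(R_0^T B R_0) = 1/lambda already reaches the minimizer.
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
B = M.T @ M + np.eye(6)                 # symmetric positive definite
b = rng.standard_normal(6)
x_star = np.linalg.solve(B, b)          # the minimizer of phi

# Choose x_0 so that R_0 is an eigenvector: take an eigenpair (lam, v) of B and
# set x_0 = x_star - v / lam, which gives R_0 = b - B x_0 = v.
lams, V = np.linalg.eigh(B)
lam, v = lams[0], V[:, 0]
x0 = x_star - v / lam

r0 = b - B @ x0
alpha0 = (r0 @ r0) / (r0 @ (B @ r0))    # equals 1/lam here
x1 = x0 + alpha0 * r0

print("alpha0 - 1/lambda:", alpha0 - 1.0 / lam)
print("||x1 - x_star||  :", np.linalg.norm(x1 - x_star))
```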